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ABSTRACT 


A  hardwired  digital  processor  based  on  the  Cooley-Tukey 
algorithm  is  presented.  A  laboratory  prototype  has  been  built  with 
two  modes  of  operation.  As  a  cascade  fast  Fourier  transformer,  it 
can  simultaneously  calculate  transforms  of  two  independent,  continuous 
data  streams  at  word  rates  in  excess  of  3  MHz.  As  a  nonrecursive 
digital  filter,  it  can  produce  filter  impulse  responses  of  up  to 
32  points.  The  digital  filter  also  operates  from  data  source  with 
word  rates  exceeding  3  MHz. 

•  The  processor  has  been  integrated  into  a  system  with  other 

signal  processing  components  including  a  small  general  purpose 
computer.  Laboratory  demonstration  of  the  processing  system  as  a 
spectrum  analyzer  and  as  an  adaptive  filter  for  distortion  correction 
is  discussed. 
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SECTION  I 


INTRODUCTION 


This  paper  describes  a  digital  processor  based  on  the  fast 
Fourier  transform  (FFT)  algorithm.  The  paper  begins  by  describing 
the  design  theory  for  a  fast  Fourier  transformer  and  continues  with 
the  development  of  two  versions  for  a  cascade  FFT  each  capable  of 
transforming  simultaneously  two  independent  data  blocks.  The 
cascade  organization  permits  designing  FFT  processors  for  high  data 
rates. 

The  hardware  design  of  a  laboratory  prototype  is  then  discussed. 
The  resulting  processor  has  two  modes  of  operation:  As  a  512  point 
FFT,  it  can  simultaneously  transform  two  independent  data  sets.  As 
a  non-synchronized  digital  filter,  it  can  be  used  in  applications 
requiring  up  to  32  points  of  impulse  response.  Both  modes  operate 
at  word  rates  in  excess  of  3  MHz,  making  possible  their  use  as  real 
time,  wideband  processors. 

Also  covered  is  the  integration  of  the  laboratory  prototype 
with  other  signal  processing  components  into  a  flexible  laboratory 
system  with  real  time  spectrum  analysis  and  adaptive  filtering 
capabilities  on  bandwidths  approaching  3  MHz. 
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SECTION  II 


THE  ALGORITHM 


The  Fast  Fourier  Transform  (FFT)  algorithm  provides  an 
efficient  procedure  for  calculating  the  discrete  Fourier  transform 
(DFT)  of  a  data  set  containing  N  (=2^)  points  The  algorithm 

requires  a  total  number  of  operations  proportional  to  N  log2  N, 
whereas  the  straightforward  approach  requires  a  number  of  operations 
proportional  to  N2;  for  large  N  the  savings  are  significant.  The 
algorithm  can  be  generalized  to  a  radix  other  than  2,  but  this 
discussion  will  be  confined  only  to  transforms  using  radix  2. 

The  object  is  to  calculate  the  discrete  Fourier  transform  of  the 

series  x  ,  . . .  x.T  -  . 

o  1  N-l 
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The  decimation  in  frequency  form  of  the  algorithm  is  considered 

here,  A  similar  development  using  decimation  in  time  is  possible. 

The  flow  chart  for  an  8  point  transform  using  decimation  in  frequency 

(2) 

(Figure  1)  can  be  implemented  by  the  design  shown  in  Figure  2VW.  A 
convenient  property  of  the  decimation  in  frequency  algorithm  is  that 

the  last  k  stages  constitute  a  2k  point  FFT.  Extending  this  design 
for  larger  transform  capability  means  adding  similar  stages  at  the  input. 
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Figure  1.  Flow  Graph  of  8  Point  FFT  for  Decimation  in  Frequency 
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Figure  2.  FFT  Processor  for  N=8.  Inputs  are  in  Natural  Order 


Outputs  are  in  Bit  Reversed  Order. 


* 


Study  of  the  data  flow  for  the  design  of  Figure  2  reveals  that 
all  circuitry  is  idle  half  the  time.  This  fact  is  exploited  to 
advantage  by  rearranging  each  stage  to  have  the  canonic  form  shown 
in  Figure  3,  The  increase  in  complexity  of  the  rearranged  interstage 
,  switch  is  offset  by  the  savings  in  delay  line  storage  and  by  the 
capability  of  computing  two  transforms  simultaneously.  Cascading 
k  stages  of  the  form  shown  in  Figure  3  results  in  a  2  point  dual 
input  FFT.  If  two  continuous  data  streams  are  presented,  one  to  the 
top  input  and  the  other  to  the  bottom  input  of  the  first  interstage 
switch,  the  transforms  of  both  data  streams  will  be  calculated  and 
separated  at  the  output.  The  frequency  points  appear  in  bit 
reversed  order  at  the  output,  half  from  each  input  on  the  top  path, 
the  other  half  on  the  bottom.  Also,  if  the  spectrum  of  the  input 
is  considered  to  be  centered  at  zero  frequency,  all  points  at  the  top 
transform  output  represent  plus  frequencies  while  the  transforms 
points  out  the  bottom  are  the  negative  frequency  components. 

BIT  REVERSED  INPUTS 

Because  of  the  interest  in  the  development  of  digital  filters 
using  FFT  techniques,  it  was  necessary  to  arrive  at  a  companion  FFT 
design  that  accepted  input  data  in  bit  reversed  order.  A  rearrange¬ 
ment  of  the  flow  graph  for  natural  order  inputs  (Figure  2)  allows 
a  bit  reversed  input  order  and  a  natural  output  order. ^  ^ 

A  rearranged  implementation  similar  to  the  previous  one  is 
possible,  and  its  canonic  form  is  shown  in  Figure  4.  Here  data  to 


5 


Figure  3.  kth  Stage  from  Output  -  Canonic  Form  for  Natural  Input  Order. 
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Figure  4.  kth  Stage  from  Input 
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Canonic  Form  for  Bit  Reversed  Input  Order. 


be  transformed  is  input  on  the  top  and  bottom  paths  and  ends  up  in 

serial  format  after  the  final  stage  switch  output.  The  more 

complicated  interstage  switch  and  the  corresponding  decrease  in  delay 

line  storage  have  been  utilized  here,  too.  And  again,  circuitry 

idle  half  the  time  is  captured  by  a  second,  independent  data  input 

k-1  k 

block  offset  in  time  by  2  points.  A  2  point,  dual  input  trans¬ 
form  for  bit  reversed  inputs  is  developed  again  by  cascading  k  stages. 

The  implementation  for  transforming  two  data  sets  each  of 
N(=2  )  points  with  either  FFT  design  is  developed  by  cascading  k 
similar  stages  from  the  proper  canonic  form.  In  either  case  the 
resulting  complexity  is  k  interstage  switches,  k  arithmetic  sections 

(addition-subtraction),  k  multipliers,  N  weighting  coefficients,  and 
3 

N  words  of  delay  line  storage.  Either  design  will  calculate  the 
inverse  DFT  by  using  the  conjugate  Fourier  weights. 

DIGITAL  FILTER 

It  has  been  shown  that  performing  discrete  convolutions  by 

(3) 

transform  methods  can  yield  computational  savings  .  The  inverse 

DFT  of  a  function  formed  by  the  product  of  the  DFT's  of  two  data 
sets  is  the  circular  convolution  of  the  two  data  sets.  Continuous 
digital  filtering  can  be  realized  with  this  technique  if  care  is 
taken  to  make  the  circular  convolution  appear  as  a  normal  aperiodic 
convolution.  This  is  accomplished  by  using  N  point  time  signals 
having  y  data  points  followed  by  zeros.  For  continuous  signals 
this  is  done  by  breaking  the  input  into  two  separate  sets  each 
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containing  blocks  of  y  data  points  padded  out  with  y  zero  points 
N 

and  staggered  by  y  data  points.  This  is  illustrated  in  Figure  5. 

Each  signal  xT  and  xR  can  be  separately  passed  through  a  filter 

N  N 

h^  described  by  y  points  of  impulse  response  (padded  with  y  zero 

points)  and  the  separate  responses  are  added  to  yield  the  result  of 

x^  passed  through  the  same  filter.  Using  transform  techniques  this 

requires  two  separate  transformers,  two  multipliers  to  form  the 

product  of  the  frequency  characteristics,  and  two  separate  inverse 

transformers.  Shown  in  Figure  6  is  such  a  filter.  Note  that  all 

N 

transform  operations  are  N  points  even  though  only  y  data  points 
N 

are  utilized  (the  y  zero  points  must  be  considered).  This  filter 

will  give  the  correct  output  for  the  convolution  of  the  input  x^, 

unlimited  in  extent,  with  the  filter  response  h^,  which  must  be 
N 

limited  to  y  points  to  avoid  the  aliasing  of  the  circular  convolution. 

The  dual  input  cascade  transformers  discussed  in  the  previous 
section  are  ideal  for  the  digital  filtet  implementation  shown  in 
Figure  6.  Advantage  can  now  be  taken  of  the  two  transform  capability. 
If  the  data  x^  is  input  to  the  top  path  of  the  transformer,  and  the 
bottom  path  is  constrained  to  zero,  and  the  switch  is  directed  to 
the  feed  through  position,  then  the  data  blocks,  x^,  and  x^,  of 
Figure  6  are  formed  as  the  two  independent  inputs  to  the  transformer. 
Elimination  of  the  switch  and  delay  line  at  the  input  stage  will  also 
accomplish  the  x„  and  xn  data  blocking. 

When  employed  in  this  fashion,  the  outputs  of  the  cascade  trans- 
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V 

Figure  5.  Data  Blocking  for  Continuous  Filtering 


10 


11 


Figure  6.  Continuous  Filter  Using  FFT 


former  have  the  data  points  of  the  transform  of  followed  by  those 
of  xg.  Two  multipliers  at  this  point  form  the  product  of  these 
transforms  with  the  transform  of  the  filter  response  h^  stored  in  the 
memory. 

The  outputs  of  the  multipliers  become  the  inputs  for  the 

cascade  inverse  transformer  of  the  type  already  described  which 

accepts  bit  reversed  data  inputs.  Now  the  inputs  are  bit  reversed 

frequency  points  and  outputs  natural  ordered  time  points.  The  switch 

at  the  final  stage  output  is  replaced  by  an  adder  which  sums  the 

responses  due  to  the  x  and  xD  inputs.  Thus,  a  digital  filter  is 

formed  which  operates  continuously  on  the  input  signal  x^  with  no 

synchronism  required  between  x^  and  h^.  The  construction  of  a  filter 
N 

h^  described  by  points  of  impulse  response  requires  an  N  point 
dual  input  FFT  (with  modified  input  switch  and  delay  structure), 

2  complex  multipliers,  N  words  of  storage  for  the  transform  of  h^, 
and  an  N  point  inverse  dual  input  FFT  capable  of  accepting  bit 
reversed  inputs  while  outputting  natural  order  time  points.  The 
filter  characteristic  h^  can  be  altered  by  simply  entering  new 
transform  values  into  the  memory. 
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SECTION  III 


THE  IMPLEMENTATION 


The  following  constraints  were  placed  on  the  laboratory  proto¬ 
type  processor  design:  1)  In  addition  to  its  spectrum  analysis 
capabilities,  it  should  have  significant  impact  in  the  area  of 
adaptive  digital  filtering.  2)  It  should  be  developed  from  available 
components.  3)  It  should  be  capable  of  Megahertz  processing  rates. 

4)  It  should  not  be  unwieldy  in  size.  5)  It  should  show  a  high 
degree  of  modularity  with  consideration  for  the  future  advances  in 
large  scale  integration. 

Ten  bit,  two's  complement  arithmetic  is  used  for  all  operations. 
Complex  words  are  described  by  20  bit  rectangular  coordinates. 

Overflow  in  the  arithmetic  is  detected  but  automatic  correction  is 
generally  not  possible  in  the  cascade  processor.  Interstage  scaling, 
as  a  front  panel  option,  is  incorporated  to  permit  optimization  of 
the  dynamic  range  and  to  control  overflows. 

A  typical  stage  for  either  type  of  transformer  discussed  requires 
five  basic  units:  An  interstage  switch,  delay  line  storage,  and 
arithmetic  section,  a  multiplier,  and  Fourier  weighting  coefficient 
•generation  or  storage.  The  complexity  of  each  of  these  items 
depends  basically  on  two  system  parameters:  Operating  speed  and  digital 
word  length.  A  preliminary  design  of  critical  components  indicated 
that,  with  a  10  bit  word  length,  2  to  4  MHz  maximum  operating  rates 
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could  be  expected  from  the  multipliers,  the  longer  delay  line  storage, 
and  weight  storage. 

The  resultant  size  of  nine  stages  (512  transform  points)  for 
the  dual  input,  cascade  FFT  processor  provides  reasonable  size  and 
complexity.  Transform  lengths  of  any  power  of  two  through  512 
points  are  possible  with  this  equipment.  Also,  should  larger  trans¬ 
forms  be  desired,  additional  stages  could  be  added  along  with 
additional  timing  and  control.  A  block  diagram  of  the  nine  stage 
FFT  is  shown  in  Figure  7. 

The  digital  filter  design  consists  of  a  six  stage  dual  input 
FFT,  two  multipliers  and  storage,  and  a  six  stage  dual  input  invetfse 
FFT.  The  six  stage  FFT  is  exactly  the  last  six  stages  of  the  512 
point  dual  input  FFT.  The  six  stage  inverse  FFT  for  bit  reversed 
inputs  is  a  separate  subsystem,  but  the  design  permits  borrowing 
some  computation  modules  from  the  three  unused  stages  of  the  512  point 
FFT.  The  block  diagram  of  this  twelve  stage  digital  filter  is 
shown  in  Figure  8. 

INTERSTAGE  SWITCH 

The  interstage  switch  routes  the  top  and  bottom  outputs  of  one 
stage  to  the  top  and  bottom  inputs  of  the  next  upon  control  from  the 
central  timing.  A  shift  right  option  is  added  to  the  switch  to 
divide  all  inputs  by  two  with  no  rounding.  There  is  a  negative 
roundoff  error  introduced  by  this  scaling,  but  it  was  accepted  to 
avoid  the  use  of  the  additional  components  necessary  to  round  the 
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Figure  7.  Nine  Stage  -  512  Point  -  FFT 


12  Stage  Digital  Filter 


J* 


Figure  8. 


answer.  This  scaling  option  does  not  compensate  for  all  overflow 

(4) 

possibilities  in  a  stage  .  The  implementation  for  a  10  bit  word 
is  carried  out  with  standard  TTL  multiplexing  devices  on  a  2  1/2 
inch  X  7  inch  circuit  card.  Two  cards  are  needed  per  stage. 

DELAY  LINE  STORAGE 

The  nine  stage  laboratory  prototype  employs  delay  line  lengths 
ranging  from  1  word  to  256  words.  Delay  lines  are  designed  for  10 
bit  wordlerigths  using  available  TTL  and  MOS  devices. 

The  1,  2,  and  4  word  delay  lines  are  constructed  from  a  common 
TTL  D-type  fiip-flop.  The  8,  16,  and  32  word  length  delay  lines 
use  TTL  8  bit  long  shift  registers.  The  shift  rates  and  propagation 
delays  are  substantially  better  than  the  system  requires. 

The  64,  128,  and  256  word  delay  lines  are  developed  from  the 
same  dual  64  bit  dynamic,  P  channel*  MOS  shift  register.  The  two 
phase  clockifig  is  developed  On  the  circuit  board  from  available 
hybrid  clock  drivers.  A  TTL  D-type  input  register  was  added  to 
maintain  the  edge  triggered  input  clocking  compatible  with  the 
TTL  delay  lihiOs.  The  resultant  shift  rates,  limited  by  the  drive 
capabilities  of  the  clocking,  approach  4  MHz.  A  64  word  delay  line 
is  contained  on  a  2  1/2  X  7  inch  circuit  card,  while  the  128  and  256 
word  delay  edCh  occupies  a  5  X  7  inch  power  card.  Power  dissipation 
at  a  3  MHz  shift  rate  measures  less  than  5  milliwatts  per  bit 
including  clocking. 
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ARITHMETIC  SECTION 


The  basic  function  of  this  circuitry  is  to  perform  the  addition 
and  subtraction  of  complex  words  as  shown  in  the  block  diagrams,  and 
to  provide  the  additions  necessary  to  complete  the  complex 
multiplications.  The  implementation  is  carried  out  with  a  combination 
of  TTL  2  and  4-bit  adder  packages,  using  10  bit  two's  complement 
operations.  The  output  of  the  10  bit  addition  is  clocked  into  a 
one  word  delay  register  to  compensate  for  the  input  register  associated 
with  the  multiplier  following  the  subtraction.  Arithmetic  overflow 
is  detected,  but  with  the  cascade  processor  it  can  only  signal  that 
computations  are  in  error.  In  general,  to  correct  for  the  overflow 
the  entire  data  block  must  be  processed  again. 

Overflow  detection  operates  on  the  basis  that  only  when  the 
input  numbers  have  the  same  polarity  can  an  overflow  situation 
exist.  The  overflow  is  then  indicated  by  the  opposite  polarity  in 
the  output  word.  Gating  logic  implements  the  overflow  detection  and 
strobes  it  into  a  latch  where  it  is  stored  until  cleared  by  a  source 
that  is  external  to  the  arithmetic  section. 

A  10  bit  adder  with  overflow  detection  was  made  a  part  of  the 
arithmetic  section,  since  the  formation  of  the  complex  product  by 
the  multiplier  following  the  subtraction  requires  two  additions. 

The  resultant  circuit  is  contained  on  a  2  1/2  X  7  inch  circuit  card 
using  solderless  wrap  interconnections,  and  two  cards  are  used  in 
each  stage. 
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COMPLEX  MULTIPLIERS 

Each  multiplication  for  complex  arithmetic  in  rectangular  co¬ 
ordinates  requires  a  total  of  four  basic  multiplications  and  two 
additions*  Eliminating  the  trivial  case  of  multiplication  by  unity, 
the  nine  stage  FFT  requires  eight  complex  multipliers,  while  the 
digital  filter  needs  twelve. 

Some  hardware  savings  were  made  by  developing  special  purpose, 

hardwired  multipliers  for  those  stages  requiring  only  a  few  weights. 

As  a  result,  the  multiplier  design  was  divided  into  three  separate 

areas:  one  general  purpose  and  two  special  purpose  multipliers. 

One  of  the  hardwired  complex  multipliers  is  configured  to 

_4  JL  4  JL 

multiply  its  input  by  1,  e  J  2  or  e  J  2  upon  command  from  the 

i  - 

timing.  Both  ±  e  J  2  phase  weights  are  necessary  to  supply  the 

Weights  for  calculation  of  either  the  discrete  transform  or  the 

inverse  discrete  transform.  Implementation  is  carried  out  by 

gating  and  multiplexing  logic.  The  other  special  purpose  multiplier 

is  somewhat  more  involved.  The  necessary  coefficients  multiplying 

.  x  .  jr.  .3tt 

the  input  ate  1,  e  ^  4  *  e  ^  2*  and  e  '*4  for  the  discrete  trans- 

a  JL  a  JL  \ 

form,  and  1,  e*’4,e^2,e^  4,  for  the  inverse  transform. 

The  same  technique  for  multiplexing  logic  is  used  with  addition  of 
a  hardwired  multiplier,  which  is  done  by  the  parallel  additions 
of  the  input  word  shifted  as  indicated  by  ones  in  the  binary 

JT 

representation  of  ■j  . 

The  basic  multiplier  used  for  the  rest  of  the  multiplications 
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accepts  a  pair  of  10  bit  two’s  complement  numbers  and  outputs 
their  product  as  a  two's  complemented  number. 

The  multiplier  forms  the  partial  products  in  parallel.  The 
partial  products  are  then  summed  in  a  tree  of  parallel  adders.  The 
necessary  correction  factors  required  in  two's  complement  arithmetic 
when  one  or  both  of  the  numbers  are  negative  are  also  wired  into  the 
hardware. 

Since  10  bit  words  are  desired  throughout  this  system,  only 
the  most  significant  ten  bits  are  accepted  as  outputs.  Care  is  taken 
to  round  this  answer. 

The  implementation  of  the  basic  multiplier  made  maximum  use 
of  TTL  2  and  4  bit  adder  packages  to  perform  the  additions  required 
by  the  word  organized  adder  tree.  The  sixty  integrated  circuits  are 
packaged,  as  shown  in  Figure  9,  on  a  single  5X7  inch  circuit  card. 
Fabrication  of  thirty-two  such  multipliers  was  accomplished  by 
automated  solderless  wrap  techniques. 

Dynamic  testing  of  all  multipliers  was  done  with  the  aid  of  a 

small  general  purpose  computer  which  tested  all  of  the  multiplier- 

20 

multiplicand  possibilities  (2  ) .  A  precise  timing  strobe  was  used 

to  latch  each  answer  and  send  it  back  to  the  computer  for  comparison 
with  the  correct  product.  All  multipliers  were  checked  for  worst 
case  propagation  delays  given  all  input  possibilities.  Worst  case 
multiplication  delays  varied  from  200  to  260  nanoseconds.  Four 
multipliers  plus  the  two  additions  on  the  arithmetic  board  were 
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Figure  9.  Front  and  Back  Views  of  10  Bit  Two's  Complement  Multiplier 


interconnected  with  the  result  that  a  full  complex  multiplication 
is  completed  in  less  than  300  nsec. 

WEIGHT  GENERATORS  AND  MEMORIES 

The  generation  of  the  Fourier  weight  coefficients  takes  on 
several  forms.  No  weights  are  needed  at  the  stages  employing  the 
special  purpose  multipliers,  since  the  weights  are  built  into  the 
hardware.  The  weights  for  the  remaining  stages  are  generated  in  two 
ways:  Read-only-memory  techniques,  and  straightforward  synthesis 
using  combinatorial  logic.  Also  incorporated  at  each  stage  is  the 
capability  of  conjugating  the  weights  to  provide  the  necessary 
coefficients  for  the  complementary  transform. 

The  stages  requiring  8  and  16  distinct  coefficients  are  each 
implemented  using  combinatorial  logic  on  2  1/2  X  7  inch  printed 
circuit  cards.  The  boards,  which  added  the  complex  conjugate 
capability,  are  of  similar  size.  Control  signals  are  provided  as 
3  and  4  bit  address  lines  from  the  central  timing. 

The  weights  for  the  input  four  stages  of  the  FFT  are  stored  in 

/ 

a  braided  transformer  type  read-only-memory  (ROM) .  With  an  access 
time  of  170  nsec  and  a  cycle  time  of  300  nsec,  it  sets  one  of  the 
speed  limitations  in  the  overall  design.  To  conserve  memory  size 
and  also  use  an  available  item,  only  the  first  128  (of  256)  weights 
(phase  angles  0  to  ^  )  for  the  input  stage  are  stored.  The  remaining 
128  are  generated  from  them.  The  final  ROM  unit  contains  10,240  bits 
of  storage. 
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The  input  stage  of  the  companion  design  requires  generation 
of  32  weighted  coefficients,  which  is  accomplished  with  TTL  fusible 
link  type  read-only  memories  on  a  single  21/2X7  inch  circuit  card. 

In  addition  to  those  weights  that  are  needed  for  the  multipliers 
internal  to  the  FFT,  filter  characteristic  weights  are  needed  for  the 
two  multipliers  in  the  digital  filter.  Since  it  is  desired  to  alter 

4  f  *  *  l  , 

the  filter  characteristics  in  the  adaptive  filter  applications,  a 
random  access  memory  is  used.  In  addition  to  its  control  by  the 
central  timing,  it  can  be  accessed  by  an  external  source.  The 
memory  is  organized  for  64  twenty  bit  words.  Writing  into  the 
memory  can  be  done  from  an  external  source  by  random  access  addressing 
or  under  control  of  the  internal  system  timing  for  applications 
involving  correlations.  The  design  utilizes  64  bit  (16  words  4  bits 
per  word),  fully  decoded,  TTL  random  access  memories,  and  the  completed 
memory  with  its  control  logic  was  packed  on  a  7  1/2  X  7  inch  circuit 
board. 

THE  TIMING 

The  overall  system  timing  and  clocking  is  controlled  from  a 
central  timing  unit,  which  basically  consists  of  a  ten  stage  counter 
and  clock  buffers.  Since  all  operations  in  the  nine  stage  FFT  or 
twelve  stage  digital  filter  have  periods  equal  to  a  power  of  two 
number  of  clock  cycles,  the  flip-flops  of  the  counter  essentially 
provide  all  the  control  signals  necessary.  However,  since  each 
transformer  stage  has  an  extra  delay  register  associated  with  each 
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multiplier  and  arithmetic  section  the  controls  required  in  each 
stage  are  out  of  phase  with  all  others.  This  situation  is  compensated 
for  by  incorporating  in  the  timing  an  adder  circuit  for  each  stage 
co  add  or  subtract  the  necessary  phase  offset  needed  in  the  counter 
to  properly  synchronize  the  controls  for  the  system.  A  reference 
for  zero  offset  was  chosen  as  the  output  of  the  FFT.  The  interstage 
switch  control  and  weight  generator  addressing  are  taken  directly 
for  the  adder  outputs. 

All  master  clocking  is  developed  on  the  timing  board  from  the 
external  clock  input  and  distributed  to  the  four  main  sections  of 
the  system.  There  they  are  buffered  further  for  increased  fan  out. 
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SECTION  IV 


SYSTEM  INTEGRATION 


FABRICATION 

The  circuit  boards  described  in  the  previous  section  are  mounted 
planar  fashion  on  frames  capable  of  accepting  up  to  three  5X7  inch 
boards  (Figure  10) .  The  frames  are  mounted  vertically  in  drawers 
which  contain  slots  for  fifteen  frames.  A  solderless  wrap  back¬ 
plane  in  each  drawer  provides  for  interconnection  of  the  frames. 

Access  to  10  bit  words  on  the  backplane  for  monitor  and  test 
capability  is  made  available  by  consistent  organization  of  the  word 
bits,  and  the  use  of  a  special  10  pin  adapter  plug  which  fits  over 
the  Wrap  posts.  The  entire  system  is  housed  in  four  drawers  that 
ate  slide  mounted  in  a  standard  size  rack. 

The  entire  system  is  shown  in  Figure  11.  The  nine  stage  FFT 
is  contained  in  the  top  two  drawers  plus  part  of  the  third  one.  The 
random  access  memory  for  the  filter  characteristic  along  with  the 
two  multipliers  plus  the  input  stage  for  the  companion  design 
complete  the  third  drawer.  The  remaining  five  stages  of  the  companion 
transformer  occupy  the  fourth  drawer.  Six  of  the  pull  out  frames 
that  contain  Computational  units  are  shared,  depending  on  the 
systems  configuration,  between  the  input  three  stages  of  the  nine 
stage  FFT  (drawer  1)  and  the  input  three  stages  of  the  companion 
transformer  (drawers  3  and  A). 
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Figure  10.  Computation  Unit  -  2  Multipliers,  1  Arithmetic  Unit,  1  Interstage  Switch 
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DESIGN  VERIFICATION  TESTS 


The  system  testing  was  done  using  a  simple  recursive  filter 
(Figure  12) ,  which  produces  sinusoidal  outputs  when  impulsed  for  a 
signal  generator  The  output  "frequency"  is  controlled  by  the 

constant  entered  into  the  multiplier,  which  is  a  10  bit  two’s 
complement  device  described  earlier. 

A  computer  simulation  of  the  entire  system  was  employed  to 
provide  a  printout  sheet  of  the  actual  numbers  expected  as  a  result 
of  recursive  filter  inputs.  Using  the  simulation  results,  the  FFT 
and  the  digital  filter  configurations  were  debugged,  and  with  a 
additional  display  aids  dynamic  operation  at  word  rates  in  excess  of 
3  MHz  was  verified. 

THE  RESULTANT  PROCESSOR 

The  integration  of  the  FFT/digital  filter  with  other  components 
has  resulted  in  a  laboratory  system  with  real  time  processing 
capabilities.  The  other  critical  elements  available  for  the  system 
are  a  general  purpose  computer  and  a  digital  multiplex  switch. 
Peripherals  for  the  16  bit  word  computer  included  magnetic  tape, 
disc,  teletype,  line  printer,  and  output  display  screen.  The  digital 
switch  allows  the  interconnection  of  various  laboratory  equipments 
to  be  controlled  by  its  front  panel  switches  or  by  software  from 
the  computer.  System  configuration  changes  are  possible  under 
software  control  at  computer  input-output  (I/O)  rates.  Two  of  the 
computer  I/O  channels  are  connected  to  the  switch — one  for  data 
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Figure  12.  Recursive  Filter 


transfer  and  the  other  for  system  control.  The  devices  connected 
to  the  switch  and  under  its  control  included  analog  to  digital  (A/D) 
converters  (up  to  10  MHz  conversion  rates  available  with  quadrature 
mixing  and  filtering) ,  digital  to  analog  (D/A)  converters  for 
display,  a  1024  word  (10  bits  per  word)  buffer  memory,  plus  the  FFT 
and  recursive  filter. 

Spectrum  Analysis 

The  spectrum  analysis  capabilities  of  this  equipment  include 
both  real  time  and  non  real  time  processing.  When  the  nine  stage 
FFT  is  joined  by  the  high  speed  A/D  converters,  IF  bandwidths 
approaching  3  MHz  can  be  continuously  analyzed  with  512  resolution 
cells.  The  two  transform  capability  is  exploited  by  also  inputting 
data  from  another  pair  of  A/D  converters,  keeping  all  clocking 
common.  The  transformed  data  is  output  at  a  continuous  120  megabits 
per  second,  which  poses  a  futher  processing  problem  depending  on 
the  application.  Insertion  of  the  buffer  memory  in  front  of  the 
FFT  will  provide  analysis  of  IF  bandwidths  approaching  10  MHz,  but 
continuous  analysis  of  the  input  is  then  sacrificed.  More  logically, 
the  buffer  memory  is  located  after  the  FFT  where  it  accepts  selected 
transforms  for  subsequent  entry  into  the  computer.  In  this  mode, 
continuous  coverage  of  IF  bandwidths  up  to  3  MHz  can  be  displayed 
through  the  high  speed  D/A  converters,  and  yet  single  block  transforms 
can  be  sent  to  the  computer  for  storage  or  additional  processing. 

Although  the  FFT  was  neither  designed  for  nor  optimized  for 


the  application,  a  system  was  configured  to  demonstrate  the  FFT  as  a 
special  purpose  processor  for  the  computer.  Data  from  the  computer 
is  transformed  by  the  FFT  and  the  transform  sent  back  to  the  computer. 
The  computer  input-output  rate  generally  limits  the  speed  of  this 

ft  -  • 

operation,  but  considerable  time  savings  are  gained  over  the  cal¬ 
culation  time  in  a  general  purpose  machine,  if  input-output  word 
rates  of  3  MHz  were  available,  processing  rates  of  less  than  lOOysec 

per  512  point  transform  are  possible. 

*  .  * 

As  an  example  of  the  spectrum  analysis  operation,  examine  the 

5 

photographs  in  Figures  13  and  14.  These  were  taken  in  a  non-real 
time  processing  mode  with  the  computer  supplying  the  data  to  be 
transformed. 

Adaptive  Filtering 

The  12  stage  digital  filter  has  strong  potential  in  the  area 
of  adaptive  filtering.  Two  features  make  it  attractive:  the  filter 
characteristic  is  easily  changed  because  of  the  high  speed  digital 

'•A  I 

storage,  and  its  organization  permits  the  filter  to  operate  completely 
asynchronously  with  the  input  signal.  Its  capability  also  includes 

continuous  operation  on  channel  IF  bandwidths  approaching  3  MHz  and 

v  a 

implementation  of  filters  with  up  to  32  points  of  impulse  response. 

The  following  five  part  experiment  was  implemented  to  show  how 

*  <  i'ft, 

the  system  could  be  used  as  an  adaptive  processor.  It  is  desired  to 

V-  ^  V  * 

¥  V-  '  , 

correct  for  the  distortion  resulting  from  the  transmission  of  a 

*  fv 

signal  through  an  unknown  channel.  The  channel  frequency  response 
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Figure  13.  Squared  Magnitude  Output  of  512  Point  FFT  for  Video  Pulse  at  Input  //I 
(Left  Half  of  Display)  and  a  Sinusoid  at  Input  //2  (Right  Half  of  Display) 
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Figure  14.  Squared  Magnitude  Output  of  512  Point  FFT  for  a  Linear  FM  Signal 
at  Input  //I  (Left  Output)  and  No  Signal  at  Input  #2  (Right) 


is  obtained  by  transforming  the  channel  impulse  response.  The 
reciprocal  of  the  frequency  characteristic  is  then  calculated.  This 
reciprocal  characteristic  would  be  an  ideal  correction  filter  if  it 
corresponded  to  a  finite  impulse  response  filter.  Since  in  general 
it  does  not,  it  must  be  transformed  into  the  time  domain,  truncated, 
and  then  transformed  back  into  the  frequency  domain  to  give  the 
desired  characteristic.  The  detailed  steps  of  the  experiment  are 
as  follows.  Also,  see  Figure  15. 

1:  Input  to  the  digital  filter  64  points  of  impulse  response 

e  (k)  from  a  distorted  channel.  Send  the  64  point  DFT, 
s 

Eg(n),  (through  the  buffer  memory  if  necessary)  to  the 

computer,  where  the  reciprocal  function  H  (n)  =  -  /  is 

s  fci  ( n ) 

s 

calculated. 

2:  Enter  the  64  points  of  H  (n)  into  the  filter  characteristic 

s 

memory . 

3:  Impulse  the  digital  filter  to  generate  the  64  point  response 

hs(k). 

4:  Enter  h  (k)  into  the  filter  input  which  blocks  h  (k)  into  two 

s  s 

separate  64  point  functions,  the  first  32  and  the  last  32 

points  of  hg(k)  each  padded  out  with  32  zero  points.  Store 

in  the  filter  memory  the  64  point  transform  H^(n)  of  the  first 

32  points  of  h  (k)  padded  out  with  32  zeros.  The  desired 
s 

filter  characteristic  is  now  stored. 
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Figure  15 


Adaptive  Filter  Experiment 


5. 


Input  e  (k)  into  the  filter  and  the  output  should  approximate 
s 

an  impulse  function. 

Parts  1  through  4  are  the  operations  required  to  adapt  the 
filter  to  a  distorted  channel, and  the  elapsed  time  for  these  parts 
is  dependent  on  the  computer  I/O  rates  and  its  calculation  time 
for  H  (n) .  The  processing  rates  of  the  filter  is  retained  at  the 
3  MHz  capability,  since  the  computer  is  necessary  only  in  adapting 
process. 

For  the  demonstration  experiment  the  computer  served  as  both 
the  controller  as  described  earlier  and  as  the  signal  generator. 


See  Figure  16  for  an  example  of  the  adaptive  processor  in  operation. 
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Figure  16a,  Impulse  Response  of  Distorted  Channel  With  One  Multipath 
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Figure  16b.  Impulse  Response  of  Correcting  Filter  From  Part  3 
of  Adaptive  Filter  Experiment 


Figurfe  16c.  Corrected  Response  From  Part  5  of  Adaptive  Filter  Experiment 
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